The Legendre transform of a function $f(x)$ encodes the same information in a different (maybe more convenient) form. It is similar in spirit to the Fourier transform, where we encode the information of the function in the coefficients $C_{\omega}$ of the "harmonics" $e^{i\omega t}$.
In the case of the Legendre transform, we express the information of the function by associating a value to every slope $p$ of the original function. This only works for a function whose slope is in one-to-one correspondence with the independent variable $x$; that is, $f'$ must be injective. Usually we take $f$ strictly convex, so there is no problem here.
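To make this concrete, here is the usual formula with the most common sign convention, written as a sketch under the assumption that $f$ is differentiable and strictly convex. The names $f^*$ and $x(p)$, and the quadratic example with a constant $a>0$, are introduced here only for illustration.

$$f^*(p) = p\,x(p) - f(x(p)), \qquad \text{where } x(p) \text{ solves } f'(x)=p$$

For example, with $f(x)=\frac{a}{2}x^2$:

$$f'(x)=ax=p \;\Rightarrow\; x(p)=\frac{p}{a}, \qquad f^*(p)=p\cdot\frac{p}{a}-\frac{a}{2}\left(\frac{p}{a}\right)^2=\frac{p^2}{2a}$$

Note that $(f^*)'(p)=p/a=x(p)$: differentiating the transform gives back the original variable.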
The point is that in certain situations the slope $f'(x)=:p$ is more important than $x$ itself.
*Idea:* In a mountain stage of a cycling race, we want to know the altitude as a function of the distance covered (because we struggle when oxygen is scarce), and from it we can determine the slope (because climbing is tough, and we want to know which gears to use). So we can say that at kilometer 8 the slope will be 6 percent at an altitude of 1250 m. For certain purposes, however, it is more useful to take the slope as the independent variable (for example, the slope dictates the gear setup, whereas the distance covered does not). We could express altitude as a function of slope simply by solving $f'(x)=p$ for the distance $x$ and substituting into the original function. But in doing so we would lose the information about distance! The solution is the Legendre transform.
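The loss of information can be checked numerically. The sketch below uses two hypothetical altitude profiles, $f(x)=\frac{1}{2}(x-c)^2$ with $c=0$ and $c=3$ (units dropped; these profiles and all function names are illustrative assumptions, not part of the notes). Naive substitution gives the same altitude-versus-slope function for both climbs, while the usual Legendre transform $f^*(p)=p\,x(p)-f(x(p))$ tells them apart and its derivative returns the distance.

```python
# Hypothetical altitude profiles (not from the notes): two convex climbs that
# differ only by a horizontal shift of c units of distance.
def make_profile(c):
    """Return altitude f and the inverse slope x(p) for f(x) = (x - c)**2 / 2."""
    f = lambda x: 0.5 * (x - c) ** 2
    x_of_p = lambda p: p + c  # solves f'(x) = x - c = p for x
    return f, x_of_p

def altitude_of_slope(profile, p):
    """Naive substitution: altitude expressed through the slope only."""
    f, x_of_p = profile
    return f(x_of_p(p))

def legendre(profile, p):
    """Legendre transform f*(p) = p * x(p) - f(x(p))."""
    f, x_of_p = profile
    x = x_of_p(p)
    return p * x - f(x)

def derivative(g, p, h=1e-6):
    """Central finite difference, good enough for this illustration."""
    return (g(p + h) - g(p - h)) / (2 * h)

unshifted = make_profile(0.0)
shifted = make_profile(3.0)
p = 0.7

# Naive substitution cannot tell the two climbs apart...
same_naive = abs(altitude_of_slope(unshifted, p) - altitude_of_slope(shifted, p)) < 1e-12
# ...whereas the Legendre transforms differ by c * p,
gap = legendre(shifted, p) - legendre(unshifted, p)
# and the derivative of the transform returns the distance x(p) = p + c.
recovered_x = derivative(lambda s: legendre(shifted, s), p)
```

Here the inverse of the slope is written by hand because the profiles are quadratic; for a general convex profile one would have to solve $f'(x)=p$ numerically.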
In Lagrangian mechanics $\frac{\partial L}{\partial q_t}=p$ is the generalized momentum (which appears again and again), so it is natural to want to express $L$ as a function of $p$.
We could simply solve
$$\frac{\partial L}{\partial q_t}=p$$
for $q_t$ and find an expression for a new
$$\tilde{L}(q,p)=L(q,q_t(p))$$
but then we would have lost information. From $\tilde{L}$ alone we cannot recover the connection between $q_t$ and $p$. Note that $L$ has two good things: 1) it measures the cost for a generalized particle to pass through a point of the jet space (the sum of all these costs is what we call *the action* of the complete curve) and 2) it lets us connect $q_t$ and $p$. Instead, $\tilde{L}$ gives the *costs* but not the other thing.
The Legendre transform is a modification of $\tilde{L}$, say $H$, built in such a way that we CAN recover everything. It is constructed so that $\frac{\partial H}{\partial p}=q_t$. But you may object: $L$ measured something, so what about $H$? The incredible thing is that $H$ is even more intuitive: it is the energy (at least for the usual mechanical Lagrangians). Moreover, the least action principle translates in this new language into Hamilton's equations, which are far more intuitive.
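As a sketch, take the standard mechanical Lagrangian $L(q,q_t)=\frac{1}{2}m\,q_t^2-V(q)$ and the usual convention $H=p\,q_t-L$. This particular $L$, the mass $m$ and the potential $V$ are assumptions chosen for illustration; they do not appear elsewhere in these notes.

$$p=\frac{\partial L}{\partial q_t}=m\,q_t \;\Rightarrow\; q_t(p)=\frac{p}{m}, \qquad \tilde{L}(q,p)=\frac{p^2}{2m}-V(q)$$

$$H(q,p)=p\,q_t(p)-\tilde{L}(q,p)=\frac{p^2}{m}-\frac{p^2}{2m}+V(q)=\frac{p^2}{2m}+V(q)$$

$$\frac{\partial H}{\partial p}=\frac{p}{m}=q_t, \qquad \frac{\partial H}{\partial q}=V'(q)=-\frac{\partial L}{\partial q}$$

So $H$ is kinetic plus potential energy, and the first derivative recovers $q_t$ from $p$. Combined with the Euler-Lagrange equation $\frac{dp}{dt}=\frac{\partial L}{\partial q}$, the second identity gives the other Hamilton equation, $\frac{dp}{dt}=-\frac{\partial H}{\partial q}$.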
One more reflection about the Legendre transform. When we have a manifold and a metric (or pseudometric) defined on the tangent bundle, we can find a canonical isomorphism between every tangent space and its dual. This lets us lower indices at our convenience.
The Legendre transform looks to me like doing something similar, but for an arbitrary function defined on the jet bundle (for example, a complicated Lagrangian), not only for a bilinear form.
I think it has to do with some kind of duality in distributions in jet spaces. See [Doubrov 2016] pages 6-7.
________________________________________
Author of the notes: Antonio J. Pan-Collantes
INDEX: